APANPS5420 Assignment 1 - Modern Time Series Assignment¶

Rake Anggoro (ma4716)¶


Executive Summary

This time series analysis project investigates patterns and forecasts of monthly U.S. flight arrival delays using publicly available data from top airline carriers and airports. The objective is to understand delay trends, identify seasonal and external disruptors, and evaluate the forecasting performance of several time series models.

Key Steps and Highlights:

  • Data Preparation: Cleaned and structured 400K+ monthly flight performance records by carrier and airport, spanning 2003 to 2025, with the analysis focused on 2005 onward. Supplementary features were engineered, including holiday counts, travel season flags, and macro-disruption indicators (COVID-19, natural disasters, economic shocks).

  • Exploratory Analysis: Visualized and quantified delays by month, season, carrier, and airport. Key patterns revealed spikes in delay rates during major holidays and notable disruptions in 2020 (COVID-19 impact).

  • Modeling Approaches Tested:

    • Simple Moving Average (SMA): Captured basic seasonality and provided a baseline forecast.
    • Exponential Moving Average (EMA): Introduced more responsiveness to recent trends.
    • Seasonal-Trend Decomposition (STD): Effectively separated trend and seasonal components, showing high accuracy.
    • Facebook Prophet: Applied with and without external regressors (e.g., holidays, disasters). It showed improvements when enriched with features but still underperformed compared to STD in this context.

Performance Summary:

| Model | MAPE (%) | MSE | % Points Outside Bounds |
|---|---|---|---|
| SMA | 28.66 | 1.09×10¹² | 31.09% |
| EMA | 36.84 | 1.47×10¹² | 37.60% |
| STD | 21.00 | 4.62×10¹¹ | 10.87% |
| Prophet (default) | 39.40 | 1.15×10¹² | 14.05% |
| Prophet (w/ regressors) | 28.12 | 7.11×10¹¹ | 17.77% |

Key Takeaways:

  • STD outperformed more complex models in both accuracy and robustness for in-sample forecasting.
  • Prophet’s performance improved significantly with external features, though still not surpassing STD for this dataset.
  • Major external events (COVID-19, hurricanes, economic disruptions) play a clear role in delay patterns and should be included in any serious forecasting pipeline.

This project demonstrates the power of combining domain-driven feature engineering with a comparative approach to time series modeling, enabling more informed operational planning in the aviation sector.

Load Packages¶

In [1]:
import warnings
warnings.filterwarnings("ignore")

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.io as pio
pio.renderers.default = "plotly_mimetype+notebook"
import plotly.graph_objects as go
from statsmodels.tsa.api import SimpleExpSmoothing
import statsmodels.api as sm
from plotly.subplots import make_subplots
from pandas.tseries.holiday import USFederalHolidayCalendar

import numpy as np
np.float_ = np.float64  # avoid Prophet error: `np.float_` was removed in the NumPy 2.0 release
from prophet import Prophet
from prophet.plot import plot_plotly, plot_components_plotly

from sklearn.metrics import mean_absolute_percentage_error, mean_squared_error

Set Config¶

In [2]:
data_dir = "/Users/mrla/Documents/Projects/data/flight_delays/ot_delaycause1_DL"
pd.set_option('display.max_columns', None)

# Use Seaborn's white theme with bold axes and soft gridlines
sns.set_theme(style="ticks", context="talk", palette="muted")

def plot_line(data, x, y, title, ylabel, figsize=(14, 6), hue=None):
    plt.figure(figsize=figsize)
    ax = sns.lineplot(data=data, x=x, y=y, hue=hue, linewidth=2.5)
    ax.set_title(title, fontsize=18, fontweight='bold')
    ax.set_ylabel(ylabel)
    ax.set_xlabel('')
    ax.tick_params(axis='x', rotation=45)
    sns.despine()  # removes top and right spines
    plt.tight_layout()
    plt.show()

Load Data¶

This dataset, published by the U.S. Department of Transportation’s Bureau of Transportation Statistics (BTS), contains airline on-time performance and delay cause metrics from June 2003 through February 2025. It includes detailed breakdowns of delays by category (carrier, weather, NAS (National Aviation System), security, and late-arriving aircraft), alongside data on cancellations and diversions. The information is publicly available via the BTS TranStats portal and supports analysis of trends in U.S. aviation operations.

In [3]:
df = pd.read_csv(data_dir + "/Airline_Delay_Cause.csv")

Data Definitions (Source)¶

| Column Name | Column Definition |
|---|---|
| year | YYYY format |
| month | MM format (1-12) |
| carrier | Code assigned by US DOT to identify a unique airline carrier. |
| carrier_name | Unique airline (carrier) is defined as one holding and reporting under the same DOT certificate regardless of its Code, Name, or holding company/corporation. |
| airport | A three-character alpha-numeric code issued by the U.S. Department of Transportation, which is the official designation of the airport. |
| airport_name | A place from which aircraft operate that usually has paved runways and maintenance facilities and often serves as a terminal. |
| arr_flights | Arrival Flights |
| arr_del15 | Arrival Delay Indicator (15 Minutes or More). Arrival delay equals the difference between the actual arrival time and the scheduled arrival time. A flight is considered on-time when it arrives less than 15 minutes after its published arrival time. |
| carrier_ct | Carrier Count for airline cause of delay |
| weather_ct | Weather Count for airline cause of delay |
| nas_ct | NAS (National Air System) Count for airline cause of delay |
| security_ct | Security Count for airline cause of delay |
| late_aircraft_ct | Late Aircraft Delay Count for airline cause of delay |
| arr_cancelled | Flights cancelled |
| arr_diverted | Flights diverted |
| arr_delay | Difference in minutes between scheduled and actual arrival time. Early arrivals show negative numbers. |
| carrier_delay | Carrier Delay, in Minutes |
| weather_delay | Weather Delay, in Minutes |
| nas_delay | National Air System Delay, in Minutes |
| security_delay | Security Delay, in Minutes |
| late_aircraft_delay | Late Aircraft Delay, in Minutes |

Data Exploration¶

High-level Data Summary¶

In [4]:
print(f"Shape of data: {df.shape[0]:,} rows, {df.shape[1]:,} columns")
print(f"Columns in data: {df.columns.tolist()}")
print(f"First 5 rows of data:\n{df.head()}")
Shape of data: 400,118 rows, 21 columns
Columns in data: ['year', 'month', 'carrier', 'carrier_name', 'airport', 'airport_name', 'arr_flights', 'arr_del15', 'carrier_ct', 'weather_ct', 'nas_ct', 'security_ct', 'late_aircraft_ct', 'arr_cancelled', 'arr_diverted', 'arr_delay', 'carrier_delay', 'weather_delay', 'nas_delay', 'security_delay', 'late_aircraft_delay']
First 5 rows of data:
   year  month carrier       carrier_name airport  \
0  2025      2      9E  Endeavor Air Inc.     ABE   
1  2025      2      9E  Endeavor Air Inc.     AEX   
2  2025      2      9E  Endeavor Air Inc.     AGS   
3  2025      2      9E  Endeavor Air Inc.     ALB   
4  2025      2      9E  Endeavor Air Inc.     ATL   

                                        airport_name  arr_flights  arr_del15  \
0  Allentown/Bethlehem/Easton, PA: Lehigh Valley ...         78.0        9.0   
1           Alexandria, LA: Alexandria International         78.0       12.0   
2        Augusta, GA: Augusta Regional at Bush Field         91.0       13.0   
3                   Albany, NY: Albany International         56.0       12.0   
4  Atlanta, GA: Hartsfield-Jackson Atlanta Intern...       2700.0      416.0   

   carrier_ct  weather_ct  nas_ct  security_ct  late_aircraft_ct  \
0        5.52        0.52    2.84         0.00              0.12   
1        5.77        1.62    1.75         0.00              2.86   
2        2.47        0.93    4.25         0.00              5.35   
3        4.34        0.34    5.32         0.00              2.00   
4       83.87       16.73  143.26         0.16            171.98   

   arr_cancelled  arr_diverted  arr_delay  carrier_delay  weather_delay  \
0            0.0           0.0      733.0          578.0           16.0   
1            0.0           0.0      803.0          379.0           75.0   
2            0.0           0.0      964.0          101.0          507.0   
3            2.0           1.0      761.0          246.0           35.0   
4           24.0           3.0    33668.0        10723.0         1790.0   

   nas_delay  security_delay  late_aircraft_delay  
0      102.0             0.0                 37.0  
1       92.0             0.0                257.0  
2      197.0             0.0                159.0  
3      239.0             0.0                241.0  
4     6851.0            15.0              14289.0  
In [5]:
print(f"Data types of columns:\n{df.dtypes}")
print("--------------------------------")
missing_counts = df.isnull().sum()
missing_percent = (missing_counts / len(df)) * 100

missing_df = pd.DataFrame({
    'Missing Count': missing_counts,
    'Missing %': missing_percent.round(2)
})

print("Missing values in each column:\n", missing_df)

print("--------------------------------")
print(f"Summary statistics of numerical columns:\n{df.describe()}")
Data types of columns:
year                     int64
month                    int64
carrier                 object
carrier_name            object
airport                 object
airport_name            object
arr_flights            float64
arr_del15              float64
carrier_ct             float64
weather_ct             float64
nas_ct                 float64
security_ct            float64
late_aircraft_ct       float64
arr_cancelled          float64
arr_diverted           float64
arr_delay              float64
carrier_delay          float64
weather_delay          float64
nas_delay              float64
security_delay         float64
late_aircraft_delay    float64
dtype: object
--------------------------------
Missing values in each column:
                      Missing Count  Missing %
year                             0       0.00
month                            0       0.00
carrier                          0       0.00
carrier_name                     0       0.00
airport                          0       0.00
airport_name                     0       0.00
arr_flights                    657       0.16
arr_del15                      951       0.24
carrier_ct                     657       0.16
weather_ct                     657       0.16
nas_ct                         657       0.16
security_ct                    657       0.16
late_aircraft_ct               657       0.16
arr_cancelled                  657       0.16
arr_diverted                   657       0.16
arr_delay                      657       0.16
carrier_delay                  657       0.16
weather_delay                  657       0.16
nas_delay                      657       0.16
security_delay                 657       0.16
late_aircraft_delay            657       0.16
--------------------------------
Summary statistics of numerical columns:
                year          month    arr_flights      arr_del15  \
count  400118.000000  400118.000000  399461.000000  399167.000000   
mean     2014.471181       6.502764     361.357727      69.552861   
std         6.499727       3.468805     993.307701     193.465420   
min      2003.000000       1.000000       1.000000       0.000000   
25%      2009.000000       3.000000      55.000000       8.000000   
50%      2015.000000       7.000000     113.000000      21.000000   
75%      2020.000000      10.000000     254.000000      52.000000   
max      2025.000000      12.000000   21977.000000    6377.000000   

          carrier_ct     weather_ct         nas_ct    security_ct  \
count  399461.000000  399461.000000  399461.000000  399461.000000   
mean       20.529071       2.508470      22.195918       0.172298   
std        48.302099       9.566042      79.382261       0.824913   
min         0.000000       0.000000      -0.010000       0.000000   
25%         2.780000       0.000000       1.400000       0.000000   
50%         7.500000       0.510000       4.910000       0.000000   
75%        18.630000       2.000000      13.980000       0.000000   
max      1886.580000     717.940000    4091.270000      80.560000   

       late_aircraft_ct  arr_cancelled   arr_diverted      arr_delay  \
count     399461.000000  399461.000000  399461.000000  399461.000000   
mean          24.095947       6.778667       0.833108    4168.764791   
std           74.351714      35.028071       3.786252   12756.159221   
min            0.000000       0.000000       0.000000       0.000000   
25%            1.520000       0.000000       0.000000     405.000000   
50%            5.490000       1.000000       0.000000    1140.000000   
75%           16.140000       4.000000       1.000000    2974.000000   
max         2588.130000    4951.000000     256.000000  648300.000000   

       carrier_delay  weather_delay      nas_delay  security_delay  \
count  399461.000000  399461.000000  399461.000000   399461.000000   
mean     1310.001404     223.720794    1025.811278        7.133262   
std      3829.563198     884.316576    4330.445982       39.070721   
min         0.000000       0.000000     -19.000000        0.000000   
25%       134.000000       0.000000      49.000000        0.000000   
50%       410.000000      22.000000     183.000000        0.000000   
75%      1075.000000     156.000000     555.000000        0.000000   
max    321792.000000   64550.000000  238440.000000     3760.000000   

       late_aircraft_delay  
count        399461.000000  
mean           1602.090805  
std            5154.067020  
min               0.000000  
25%              76.000000  
50%             342.000000  
75%            1089.000000  
max          279153.000000  

Insights so far

Understanding *_ct and *_delay Columns

🔹 *_ct Columns (e.g., carrier_ct, weather_ct)

  • Despite the name, these are not raw integer counts of delayed flights.

  • They contain decimal values because a flight delayed by more than one cause is proportionally apportioned across the contributing causes.

  • In the sample rows shown earlier, the five *_ct values sum (to within rounding) to arr_del15, so each field can be read as the fractional number of delayed flights attributed to that cause.
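Whether the *_ct components approximate arr_del15 can be tested directly. A minimal sketch, using two rows copied from the head() output above (the same check can be run over the full DataFrame):

```python
import pandas as pd

# Two rows copied from the head() output above; run the same check on the
# full DataFrame to see how the *_ct columns relate to arr_del15
sample = pd.DataFrame({
    "arr_del15":        [9.0, 12.0],
    "carrier_ct":       [5.52, 5.77],
    "weather_ct":       [0.52, 1.62],
    "nas_ct":           [2.84, 1.75],
    "security_ct":      [0.00, 0.00],
    "late_aircraft_ct": [0.12, 2.86],
})

ct_cols = ["carrier_ct", "weather_ct", "nas_ct", "security_ct", "late_aircraft_ct"]
sample["ct_sum"] = sample[ct_cols].sum(axis=1)
sample["ct_gap"] = (sample["arr_del15"] - sample["ct_sum"]).abs()
print(sample[["arr_del15", "ct_sum", "ct_gap"]])
```

In these two rows the components sum exactly to arr_del15, which suggests the *_ct fields are fractional allocations of delayed flights; repeating the check over all rows would confirm whether that holds dataset-wide.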

🔹 *_delay Columns (e.g., carrier_delay, weather_delay)

  • These represent delay durations in minutes.

  • In principle, their sum should match arr_delay:

    arr_delay ≈ carrier_delay + weather_delay + nas_delay + security_delay + late_aircraft_delay
    
  • In practice, small mismatches may exist due to:

    • Rounding
    • Missing values
    • Unattributed or jointly caused delays

In the next code section, we will check if the delay components add up to arr_delay and explore any mismatches.

In [6]:
# Step 1: Safely compute delay component sum by treating NaNs as 0
delay_components = (
    df['carrier_delay'].fillna(0) +
    df['weather_delay'].fillna(0) +
    df['nas_delay'].fillna(0) +
    df['security_delay'].fillna(0) +
    df['late_aircraft_delay'].fillna(0)
)

# Step 2: Compute difference from arr_delay (only where arr_delay is not null)
valid_mask = df['arr_delay'].notna()
delay_difference = df.loc[valid_mask, 'arr_delay'] - delay_components[valid_mask]

# Step 3: Identify rows where the difference is significant (e.g., > 1 minute)
mismatched_rows = df.loc[valid_mask].loc[delay_difference.abs() > 1]

# Step 4: Summary
print(f"Total valid rows (non-null arr_delay): {valid_mask.sum()}")
print(f"Rows where delay components do NOT sum to arr_delay: {len(mismatched_rows)} ({(len(mismatched_rows) / len(df)) * 100:.2f}%)")

print("\nSample mismatches:")
print(
    mismatched_rows[['arr_delay']].assign(
        delay_components=delay_components[valid_mask],
        difference=delay_difference
    ).head()
)
Total valid rows (non-null arr_delay): 399461
Rows where delay components do NOT sum to arr_delay: 9 (0.00%)

Sample mismatches:
       arr_delay  delay_components  difference
13845    39334.0           39154.0       180.0
13848    12812.0           12686.0       126.0
25479     2773.0            1428.0      1345.0
28886     6934.0            6880.0        54.0
31638     3006.0            2775.0       231.0

Out of 399,461 valid records (i.e., rows where arr_delay is not null), only 9 rows—or less than 0.01%—have a mismatch between the sum of individual delay components (carrier_delay, weather_delay, nas_delay, security_delay, late_aircraft_delay) and the reported arr_delay. This confirms that the delay decomposition is highly reliable across the dataset. The mismatches are relatively small in scale, with the largest observed difference being 1,345 minutes. These discrepancies could be due to rounding, reporting errors, or rare cases of unclassified delays.

Overall, the dataset's delay attribution can be considered consistent and trustworthy for modeling time series and anomaly detection.

Yearly Stats¶

In [7]:
# Add a 'season' column to the original DataFrame
def get_season(month):
    if month in [12, 1, 2]:
        return 'Winter'
    elif month in [3, 4, 5]:
        return 'Spring'
    elif month in [6, 7, 8]:
        return 'Summer'
    else:
        return 'Fall'

df['season'] = df['month'].apply(get_season)

# Yearly aggregated DataFrame
yearly_df = (
    df.groupby('year')
    .agg(
        n_airports=('airport', pd.Series.nunique),
        n_carriers=('carrier', pd.Series.nunique),
        total_arr_flights=('arr_flights', 'sum'),
        pct_delayed=('arr_del15', lambda x: x.sum() / df.loc[x.index, 'arr_flights'].sum()),
        avg_arr_delay=('arr_delay', 'mean'),
        avg_carrier_delay=('carrier_delay', 'mean'),
        avg_weather_delay=('weather_delay', 'mean'),
        avg_nas_delay=('nas_delay', 'mean'),
        avg_security_delay=('security_delay', 'mean'),
        avg_late_aircraft_delay=('late_aircraft_delay', 'mean')
    )
    .reset_index()
)

# Seasonal aggregated DataFrame
seasonal_df = (
    df.groupby(['year', 'season'])
    .agg(
        n_airports=('airport', pd.Series.nunique),
        n_carriers=('carrier', pd.Series.nunique),
        total_arr_flights=('arr_flights', 'sum'),
        pct_delayed=('arr_del15', lambda x: x.sum() / df.loc[x.index, 'arr_flights'].sum()),
        avg_arr_delay=('arr_delay', 'mean'),
        avg_carrier_delay=('carrier_delay', 'mean'),
        avg_weather_delay=('weather_delay', 'mean'),
        avg_nas_delay=('nas_delay', 'mean'),
        avg_security_delay=('security_delay', 'mean'),
        avg_late_aircraft_delay=('late_aircraft_delay', 'mean')
    )
    .reset_index()
)

# Optional: Ensure natural season order
season_order = ['Winter', 'Spring', 'Summer', 'Fall']
seasonal_df['season'] = pd.Categorical(seasonal_df['season'], categories=season_order, ordered=True)
seasonal_df = seasonal_df.sort_values('season')
In [8]:
# Plot 1: Number of Flights
plot_line(yearly_df, x='year', y='total_arr_flights',
          title='Number of Flights per Year', ylabel='Flights')

plot_line(seasonal_df, x='year', y='total_arr_flights', hue='season',
          title='Number of Flights per Year by Season', ylabel='Flights')

# Plot 2: Number of Airports
plot_line(yearly_df, x='year', y='n_airports',
          title='Number of Airports per Year', ylabel='Airports')

plot_line(seasonal_df, x='year', y='n_airports', hue='season',
          title='Number of Airports per Year by Season', ylabel='Airports')

# Plot 3: Number of Carriers
plot_line(yearly_df, x='year', y='n_carriers',
          title='Number of Carriers per Year', ylabel='Carriers')

plot_line(seasonal_df, x='year', y='n_carriers', hue='season',
          title='Number of Carriers per Year by Season', ylabel='Carriers')

# Plot 4: Percentage of Flights Delayed
plot_line(yearly_df, x='year', y='pct_delayed',
          title='Percentage of Flights Delayed per Year', ylabel='Percent Delayed')

plot_line(seasonal_df, x='year', y='pct_delayed', hue='season',
          title='Percentage of Flights Delayed per Year by Season', ylabel='Percent Delayed')

# Plot 5: Average Arrival Delay
plot_line(yearly_df, x='year', y='avg_arr_delay',
          title='Average Arrival Delay per Year', ylabel='Avg Delay (minutes)')

plot_line(seasonal_df, x='year', y='avg_arr_delay', hue='season',
          title='Average Arrival Delay per Year by Season', ylabel='Avg Delay (minutes)')

Some insights so far:

  • Data prior to 2005

    • The dataset is incomplete for years before 2005, with only a few carriers represented.
    • Data from before 2005 will therefore be excluded.
  • Impact of COVID-19

    • The 2020 data is heavily affected by the COVID-19 pandemic, with sharp drops in flight activity and delays.
    • By 2023, activity appears to have returned to pre-pandemic levels, with some lingering effects.
    • The 2025 data is incomplete (it runs only through February) and will not be included in this analysis.

There is also an anomaly in the number of airports and carriers in 2018: the spike appears too large and warrants further investigation.
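One way to probe the 2018 spike is to list the carriers (or airports) whose first appearance in the data is 2018. A minimal sketch with toy presence data; the codes and years below are hypothetical, and on the real df the same groupby applies:

```python
import pandas as pd

# Toy presence data (hypothetical codes/years); on the real df, the same
# groupby reveals which carriers first report in 2018
presence = pd.DataFrame({
    "carrier": ["AA", "AA", "9E", "YX"],
    "year":    [2017, 2018, 2018, 2018],
})

first_seen = presence.groupby("carrier")["year"].min()  # first reporting year per code
new_in_2018 = sorted(first_seen[first_seen == 2018].index)
print(new_in_2018)
```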

Flights of Carriers per year¶

In [9]:
# Pivot table with carriers on rows and years on columns
carrier_year_matrix = (
    df.groupby(['carrier', 'carrier_name', 'year'])
    .size()
    .reset_index(name='n_rows')
    .pivot_table(
        index=['carrier', 'carrier_name'],
        columns='year',
        values='n_rows',
        fill_value=0
    )
)

# Create a readable label: "XX – Carrier Name"
carrier_year_matrix.index = carrier_year_matrix.index.map(
    lambda x: f"{x[0]} – {x[1]}"
)

# Create heatmap
plt.figure(figsize=(10, int(len(carrier_year_matrix) * 0.4)))
sns.heatmap(
    carrier_year_matrix,
    cmap="YlGnBu",
    linewidths=0.5,
    linecolor='gray',
    cbar_kws={'label': 'Row Count', 'shrink': 0.5, 'aspect': 20},
    square=False,
)
plt.title("Carrier Activity by Year", fontsize=16, fontweight='bold')
plt.xlabel("Year")
plt.ylabel("Carrier")
plt.tick_params(axis='y', labelsize=8)
plt.tick_params(axis='x', labelsize=8)
plt.tight_layout()
plt.show()

The dataset reveals several structural inconsistencies that should be considered before conducting trend analysis or forecasting:

  • Missing Historical Presence

    • Some long-established carriers (e.g., Piedmont Airlines) only appear in recent years.
    • This likely reflects changes in reporting scope or operational status (e.g., reactivation after mergers).
  • Carrier Code vs. Carrier Name Mismatch

    • The same carrier code is reused over time but tied to different carrier names.

      • Example: "EV" appears as both ExpressJet Airlines Inc. and ExpressJet Airlines LLC.
    • This creates ambiguity in time series analysis, especially when aggregating or tracking carrier performance longitudinally.

  • Irregular or Short-Lived Carriers

    • Some carriers operate only for limited periods or on seasonal routes (e.g., Endeavor Air Inc, Peninsula Airways, Mesa Airlines Inc.).
    • These entries can distort long-term trends if not filtered or labeled appropriately.

For this analysis, we will:

  • Combine flights under the same carrier code, regardless of changes in the reported carrier name
  • Focus only on carriers with a consistent presence across the dataset, e.g. Delta Air Lines, JetBlue, United, etc.
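The name-merging step can be sketched as mapping each carrier code to a single canonical name, e.g. the most recent one on record. The rows below are toy values; the "EV" pair mirrors the mismatch noted above:

```python
import pandas as pd

# Toy rows: one code ("EV") tied to two names over time (illustrative values)
flights = pd.DataFrame({
    "carrier": ["EV", "EV", "DL"],
    "carrier_name": ["ExpressJet Airlines Inc.", "ExpressJet Airlines LLC",
                     "Delta Air Lines Inc."],
    "year": [2015, 2019, 2019],
})

# Canonical name per code: the most recent name reported for that code
latest_name = (
    flights.sort_values("year")
           .groupby("carrier")["carrier_name"]
           .last()
)
flights["carrier_name"] = flights["carrier"].map(latest_name)
print(flights["carrier_name"].tolist())
```

After the mapping, every "EV" row carries the same name, so longitudinal aggregation by carrier is unambiguous.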

Flights in Each Airport per Year¶

In [10]:
# Pivot table with airports on rows and years on columns
airport_year_matrix = (
    df.groupby(['airport', 'airport_name', 'year'])
    .size()
    .reset_index(name='n_rows')
    .pivot_table(
        index=['airport', 'airport_name'],
        columns='year',
        values='n_rows',
        fill_value=0
    )
)

# Create a readable label: "XXX – Airport Name"
airport_year_matrix.index = airport_year_matrix.index.map(
    lambda x: f"{x[0]} – {x[1]}"
)

# Transpose the matrix: now years = rows, airports = columns
airport_year_matrix = airport_year_matrix.T

fig = px.imshow(
    airport_year_matrix,
    labels=dict(x="Airport", y="Year", color="Row Count"),
    aspect="auto",
    color_continuous_scale="YlGnBu"
)

fig.update_layout(
    width=3000,  # adjust as needed
    height=600,
    title="Airport Activity by Year",
    xaxis_tickangle=45,
    xaxis=dict(
        tickfont=dict(size=8)  # smaller x-axis font
    ),
    yaxis=dict(
        tickfont=dict(size=10)  # smaller y-axis font
    )
)

fig.show()

Similar to the carrier data, the airport data also has some inconsistencies that should be considered before conducting trend analysis or forecasting:

  • Airport codes such as PVD (Providence, RI) and LAS (Las Vegas, NV) appear under more than one airport name even though each code refers to a single airport. In the case of LAS, the older entries use McCarran International Airport, which has since been renamed Harry Reid International Airport.

For this analysis, we will only consider flights from airports with high flight activity and a consistent presence across the dataset, e.g. LAS, LGA, MCI, etc.

Note: The Plotly chart above is zoomed in to showcase similar issues in the airport data; use the zoom controls to explore other airports.
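The airport filter described above can be sketched as a presence count per (year, month) period. A minimal sketch with toy data; the airports, years, and counts below are hypothetical:

```python
import pandas as pd

# Toy data (hypothetical): one row per airport-month; keep only airports
# present in every (year, month) period of the dataset
flights = pd.DataFrame({
    "airport": ["LAS", "LAS", "LGA", "XNA"],
    "year":    [2005, 2006, 2005, 2006],
    "month":   [1, 1, 1, 1],
})

# Total number of distinct (year, month) periods in the data
total_months = flights[["year", "month"]].drop_duplicates().shape[0]

# Number of periods each airport appears in
months_present = (
    flights.drop_duplicates(["airport", "year", "month"])
           .groupby("airport")
           .size()
)
consistent_airports = sorted(months_present[months_present == total_months].index)
print(consistent_airports)
```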

Popular Carriers¶

To make this analysis more focused, we will only consider the carriers with the most consistent appearance in the dataset, i.e. the ones that appeared from the start until the end of the dataset.

In [11]:
# Step 1: Total number of (year, month) combinations
total_months = df[['year', 'month']].drop_duplicates().shape[0]

# Step 2: Find in which year/month each carrier is present
carrier_presence = df.groupby(['carrier', 'year', 'month'])['arr_flights'].sum().reset_index()

# Step 3: Count active months for each carrier
carrier_months_present = (
    carrier_presence.groupby(['carrier'])
    .size()
    .reset_index(name='active_months')
)

# Step 4: Calculate consistency (% of total months)
carrier_months_present['consistency_pct'] = (
    carrier_months_present['active_months'] / total_months * 100
)

# Step 5: Total number of flights per carrier
total_flights = (
    df.groupby(['carrier'])['arr_flights']
    .sum()
    .reset_index(name='total_flights')
)

# Step 6: Merge and sort
carrier_stats = carrier_months_present.merge(total_flights, on=['carrier'])
carrier_stats = carrier_stats.sort_values(by='total_flights', ascending=False).reset_index(drop=True)

# Optional: display top 10
print(carrier_stats.head(10))
  carrier  active_months  consistency_pct  total_flights
0      WN            261       100.000000     25921109.0
1      DL            261       100.000000     16368925.0
2      AA            261       100.000000     15571282.0
3      OO            261       100.000000     13666764.0
4      UA            261       100.000000     11249820.0
5      MQ            237        90.804598      7659466.0
6      EV            208        79.693487      6494029.0
7      US            145        55.555556      5188612.0
8      B6            261       100.000000      4738419.0
9      AS            261       100.000000      3901738.0

The five largest carriers by total flights all show 100% monthly presence in the dataset:

| Carrier Code | Carrier Name |
|---|---|
| WN | Southwest Airlines |
| DL | Delta Air Lines |
| AA | American Airlines |
| OO | SkyWest Airlines |
| UA | United Airlines |

These carriers will be used for this analysis.

Data Wrangling: Filtering¶

We will filter the dataset to include only the following:

  • Years: 2005 onwards
  • Top 5 Carriers that appeared across all time periods: WN, DL, AA, OO, UA
In [12]:
df_clean = df[(df['carrier'].isin(['WN', 'DL', 'AA', 'OO', 'UA'])) & (df['year'] >= 2005)]
print(f"Original Rows {df.shape[0]:,}, Cleaned Rows {df_clean.shape[0]:,} ({(df_clean.shape[0] / df.shape[0]) * 100:.2f}% of original)")
Original Rows 400,118, Cleaned Rows 138,196 (34.54% of original)

Monthly stats¶

We can now start to explore the monthly stats of the dataset using the filtered dataset.

In [13]:
monthly_stats = (
    df_clean.groupby(['year', 'month'])
    .agg(
        n_rows=('airport', 'count'),
        n_airports=('airport', pd.Series.nunique),
        n_carriers=('carrier', pd.Series.nunique),
        total_arr_flights=('arr_flights', 'sum'),
        total_arr_del15=('arr_del15', 'sum'),
        total_arr_delay=('arr_delay', 'sum'),
        avg_arr_delay=('arr_delay', 'mean'),
        max_arr_delay=('arr_delay', 'max'),
        avg_carrier_delay=('carrier_delay', 'mean'),
        avg_weather_delay=('weather_delay', 'mean'),
        avg_nas_delay=('nas_delay', 'mean'),
        avg_security_delay=('security_delay', 'mean'),
        avg_late_aircraft_delay=('late_aircraft_delay', 'mean'),
        total_cancelled=('arr_cancelled', 'sum'),
        total_diverted=('arr_diverted', 'sum')
    )
    .reset_index()
)

# Derived metrics
monthly_stats['pct_delayed'] = (
    monthly_stats['total_arr_del15'] / monthly_stats['total_arr_flights']
)
monthly_stats['cancel_rate'] = (
    monthly_stats['total_cancelled'] / monthly_stats['total_arr_flights']
)
monthly_stats['divert_rate'] = (
    monthly_stats['total_diverted'] / monthly_stats['total_arr_flights']
)

# Create a datetime column for monthly x-axis
monthly_stats['date'] = pd.to_datetime(
    monthly_stats[['year', 'month']].assign(day=1)
)


display(monthly_stats)
year month n_rows n_airports n_carriers total_arr_flights total_arr_del15 total_arr_delay avg_arr_delay max_arr_delay avg_carrier_delay avg_weather_delay avg_nas_delay avg_security_delay avg_late_aircraft_delay total_cancelled total_diverted pct_delayed cancel_rate divert_rate date
0 2005 1 457 198 5 284514.0 66785.0 3600500.0 7878.555799 197239.0 2041.179431 589.183807 2236.919037 10.877462 3000.396061 10656.0 888.0 0.234734 0.037453 0.003121 2005-01-01
1 2005 2 453 196 5 261124.0 49000.0 2375416.0 5243.743929 221659.0 1366.306843 193.902870 1691.086093 6.485651 1985.962472 3491.0 368.0 0.187650 0.013369 0.001409 2005-02-01
2 2005 3 455 196 5 291972.0 56796.0 2957964.0 6501.019780 332144.0 1719.316484 255.615385 2000.903297 15.094505 2510.090110 3461.0 339.0 0.194526 0.011854 0.001161 2005-03-01
3 2005 4 462 197 5 280165.0 37835.0 1719410.0 3721.666667 169481.0 1154.965368 136.822511 1057.541126 7.158009 1365.179654 3167.0 306.0 0.135045 0.011304 0.001092 2005-04-01
4 2005 5 446 193 5 288952.0 40307.0 1995596.0 4474.430493 143180.0 1310.506726 168.795964 1265.957399 4.589686 1724.580717 2333.0 374.0 0.139494 0.008074 0.001294 2005-05-01
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
237 2024 10 728 279 5 431823.0 55552.0 3374479.0 4635.273352 163335.0 2114.288462 159.078297 651.013736 7.344780 1703.548077 3770.0 503.0 0.128645 0.008730 0.001165 2024-10-01
238 2024 11 724 278 5 399791.0 58548.0 3722939.0 5142.180939 206051.0 2015.313536 310.917127 1035.198895 7.893646 1772.857735 2004.0 629.0 0.146447 0.005013 0.001573 2024-11-01
239 2024 12 729 276 5 407919.0 82360.0 5864330.0 8044.348422 303265.0 3021.779150 637.676269 1291.849108 10.969822 3082.074074 2629.0 1117.0 0.201903 0.006445 0.002738 2024-12-01
240 2025 1 721 273 5 383734.0 67648.0 4839843.0 6712.680999 305779.0 2626.022191 633.302358 1247.434119 8.276006 2197.646325 10477.0 860.0 0.176289 0.027303 0.002241 2025-01-01
241 2025 2 712 273 5 358818.0 69334.0 4890789.0 6869.085674 219491.0 2578.532303 489.320225 1420.228933 6.971910 2374.032303 2858.0 674.0 0.193229 0.007965 0.001878 2025-02-01

242 rows × 21 columns

In [14]:
plot_line(
    data=monthly_stats,
    x='date',
    y='pct_delayed',
    title='% of Flights Delayed (15+ min) Over Time',
    ylabel='Percent Delayed'
)
  • Delay rates increased steadily from 2005 to around 2008, likely driven by rising air traffic and limited infrastructure capacity.
  • From 2008 to 2015, delay rates were volatile with no consistent trend, possibly reflecting airline mergers and operational adjustments.
  • Clear seasonal patterns recur each year, with delays peaking in winter and summer months, indicating strong seasonality.
  • A sharp decline in the delay percentage occurred in early 2020 with the COVID-19 pandemic and the associated drop in flight volume.
  • Post-2020, delay rates gradually returned with noticeable fluctuations, suggesting an uneven recovery and potential operational strain.
  • By 2023–2024, delay levels and variability had returned to pre-COVID patterns, indicating a full rebound in system usage and disruption patterns.
In [15]:
plot_line(
    data=monthly_stats,
    x='date',
    y='avg_arr_delay',
    title='Average Arrival Delay Over Time',
    ylabel='Avg Delay (minutes)'
)
In [16]:
plt.figure(figsize=(14, 6))
delay_cols = {
    'avg_carrier_delay': 'Carrier',
    'avg_weather_delay': 'Weather',
    'avg_nas_delay': 'NAS',
    'avg_security_delay': 'Security',
    'avg_late_aircraft_delay': 'Late Aircraft'
}

for col, label in delay_cols.items():
    sns.lineplot(data=monthly_stats, x='date', y=col, label=label, linewidth=2.5)

plt.title('Average Delay by Cause Over Time', fontsize=18, fontweight='bold')
plt.ylabel('Delay (minutes)')
plt.xlabel('')
plt.legend(title='Delay Cause')
plt.xticks(rotation=45)
sns.despine()
plt.tight_layout()
plt.show()
  • Late aircraft delays consistently cause the highest average monthly delay, making it the dominant contributor across all years.
  • Carrier-related delays are the second most significant cause, closely following late aircraft delays in trend and seasonality.
  • NAS (National Aviation System) delays also contribute significantly, with moderate seasonal fluctuations, and appear to increase again post-2020.
  • Weather delays are relatively low but highly seasonal, with visible spikes likely aligned with winter storms or summer disruptions.
  • Security delays are negligible across the entire timeline and show minimal variance, indicating they have little impact on total delays.
  • A steep drop in all delay causes occurred in early 2020, reflecting the COVID-19 impact — followed by a sharp recovery and increased variability in 2022–2024.
  • Post-COVID, late aircraft and carrier delays show more volatility and sharper spikes, suggesting growing operational strain or irregularities in flight schedules.
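The ranking of causes described above can be checked programmatically by comparing column means. A minimal sketch, using a hypothetical mini-frame whose column names mirror `monthly_stats` but whose values are illustrative (the real frame has 242 monthly rows):

```python
import pandas as pd

# Illustrative mini-frame mirroring the delay-cause columns of monthly_stats
# (values are made up for the example, not taken from the project data)
sample = pd.DataFrame({
    'avg_carrier_delay':       [2041.2, 1366.3, 1719.3],
    'avg_weather_delay':       [589.2, 193.9, 255.6],
    'avg_nas_delay':           [2236.9, 1691.1, 2000.9],
    'avg_security_delay':      [10.9, 6.5, 15.1],
    'avg_late_aircraft_delay': [3000.4, 1986.0, 2510.1],
})

# Rank causes by mean monthly delay contribution, largest first
ranking = sample.mean().sort_values(ascending=False)
print(ranking.index[0])   # dominant cause (late aircraft in this sample)
print(ranking.index[-1])  # smallest cause (security in this sample)
```

On the full `monthly_stats` frame the same two lines would confirm the ordering read off the chart.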

Time Series Analysis¶

We can now start to analyze the time series data using the filtered dataset.

Simple Moving Average¶

In [17]:
# Step 1: Aggregate total delay across all carriers and airports by month
monthly_overall_delay = (
    df_clean.groupby(['year', 'month'])['arr_delay']
    .sum()
    .reset_index()
    .sort_values(['year', 'month'])
)

# Step 2: Create a proper date column for sorting and plotting
monthly_overall_delay['date'] = pd.to_datetime(
    monthly_overall_delay[['year', 'month']].assign(day=1)
)

# Compute 5-month SMA and residuals
monthly_overall_delay['SMA'] = monthly_overall_delay['arr_delay'].rolling(window=5).mean()
monthly_overall_delay['diff'] = monthly_overall_delay['arr_delay'] - monthly_overall_delay['SMA']

# Line chart: arr_delay vs SMA
fig1 = go.Figure()
fig1.add_trace(go.Scatter(
    x=monthly_overall_delay['date'], y=monthly_overall_delay['arr_delay'],
    mode='lines', name='Arrival Delay'
))
fig1.add_trace(go.Scatter(
    x=monthly_overall_delay['date'], y=monthly_overall_delay['SMA'],
    mode='lines', name='5-month SMA'
))
fig1.update_layout(
    title='Monthly Arrival Delay vs 5-Month SMA',
    xaxis_title='Date',
    yaxis_title='Total Arrival Delay (minutes)',
    template='plotly_white',
    height=500
)
fig1.show()
In [18]:
fig2 = px.histogram(
    monthly_overall_delay,
    x='diff',
    nbins=30,
    title='Distribution of Delay Residuals (Actual - SMA)',
    labels={'diff': 'Residual (minutes)'},
    template='plotly_white'
)
fig2.update_layout(
    xaxis_title='Delay Residual',
    yaxis_title='Frequency',
    bargap=0.05
)
fig2.show()

The histogram shows that most monthly residuals fall within roughly −1M to +1M minutes, so we use ±1M as a static prediction band.

In [19]:
monthly_overall_delay['upper'] = monthly_overall_delay['SMA'] + 1000000
monthly_overall_delay['lower'] = monthly_overall_delay['SMA'] - 1000000

fig = go.Figure()

# Actual data points
fig.add_trace(go.Scatter(
    x=monthly_overall_delay['date'],
    y=monthly_overall_delay['arr_delay'],
    mode='markers',
    marker=dict(size=4, color='green'),
    name='Actual'
))

# SMA line
fig.add_trace(go.Scatter(
    x=monthly_overall_delay['date'],
    y=monthly_overall_delay['SMA'],
    mode='lines',
    line=dict(color='blue'),
    name='SMA'
))

# Upper bound (invisible, used to create shaded area)
fig.add_trace(go.Scatter(
    x=monthly_overall_delay['date'],
    y=monthly_overall_delay['upper'],
    line=dict(width=0),
    showlegend=False,
    hoverinfo='skip'
))

# Lower bound with fill to upper bound
fig.add_trace(go.Scatter(
    x=monthly_overall_delay['date'],
    y=monthly_overall_delay['lower'],
    fill='tonexty',
    fillcolor='rgba(255, 0, 0, 0.2)',
    line=dict(width=0),
    name='Prediction Interval',
    hoverinfo='skip'
))

fig.update_layout(
    title="Arrival Delay with SMA and Prediction Interval",
    xaxis_title="Date",
    yaxis_title="Total Arrival Delay (minutes)",
    template='plotly_white',
    height=500
)

fig.show()

We can see that most of the 'anomalies' have occurred since COVID-19. This is understandable: the aviation industry is still recovering from the pandemic, and flight schedules remain irregular.
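This observation can be quantified by grouping the out-of-band points by year. A small sketch with a hypothetical frame (the column names match the notebook's `monthly_overall_delay`; the values are illustrative):

```python
import pandas as pd

def anomalies_by_year(df, actual='arr_delay', lower='lower',
                      upper='upper', date='date'):
    """Count points falling outside the static band, grouped by year."""
    outside = df[(df[actual] < df[lower]) | (df[actual] > df[upper])]
    return outside.groupby(outside[date].dt.year).size()

# Tiny illustrative frame: both band breaches land in 2021
demo = pd.DataFrame({
    'date': pd.to_datetime(['2019-06-01', '2021-06-01',
                            '2021-07-01', '2022-01-01']),
    'arr_delay': [5.0, 9.0, 9.5, 4.0],
    'lower': [3.0] * 4,
    'upper': [7.0] * 4,
})
print(anomalies_by_year(demo))  # only 2021 appears, with 2 breaches
```

Applied to `monthly_overall_delay`, this would show the yearly concentration of anomalies after 2020.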

Performance Evaluation¶

In [20]:
valid_bounds = monthly_overall_delay.dropna(subset=['arr_delay', 'lower', 'upper'])

# Calculate how many points are outside the bounds
outside_bounds = valid_bounds[
    (valid_bounds['arr_delay'] < valid_bounds['lower']) |
    (valid_bounds['arr_delay'] > valid_bounds['upper'])
]

# Output the results
n_total = len(valid_bounds)
n_outside = len(outside_bounds)
pct_outside = n_outside / n_total * 100

print(f"Total valid rows: {n_total}")
print(f"Rows outside SMA bounds: {n_outside} ({pct_outside:.2f}%)")

valid_sma = monthly_overall_delay.dropna(subset=['arr_delay', 'SMA'])


mape_sma = mean_absolute_percentage_error(valid_sma['arr_delay'], valid_sma['SMA'])
mse_sma = mean_squared_error(valid_sma['arr_delay'], valid_sma['SMA'])

print(f"MAPE (SMA model): {mape_sma*100:.2f}%")
print(f"MSE (SMA model): {mse_sma:,.0f}")
Total valid rows: 238
Rows outside SMA bounds: 74 (31.09%)
MAPE (SMA model): 28.66%
MSE (SMA model): 1,088,855,420,221

Exponential Smoothing¶

In [21]:
# Fit Exponential Smoothing model
ema_model = SimpleExpSmoothing(monthly_overall_delay['arr_delay']).fit(smoothing_level=0.2, optimized=False)

# Predict full in-sample values
monthly_overall_delay['EMA'] = ema_model.fittedvalues
monthly_overall_delay['diff'] = monthly_overall_delay['arr_delay'] - monthly_overall_delay['EMA']

# Forecast the next 3 months 
forecast_3 = ema_model.forecast(3)


fig1 = go.Figure()
fig1.add_trace(go.Scatter(x=monthly_overall_delay['date'], y=monthly_overall_delay['arr_delay'],
                          mode='lines', name='Actual'))
fig1.add_trace(go.Scatter(x=monthly_overall_delay['date'], y=monthly_overall_delay['EMA'],
                          mode='lines', name='EMA (α=0.2)'))
fig1.update_layout(title="Actual vs EMA", xaxis_title="Date", yaxis_title="Arrival Delay", template="plotly_white")
fig1.show()

fig2 = px.histogram(monthly_overall_delay, x='diff', nbins=30, title='Distribution of Residuals (Actual - EMA)',
                    labels={'diff': 'Residuals'}, template='plotly_white')
fig2.update_layout(bargap=0.05)
fig2.show()

monthly_overall_delay['ema_upper'] = monthly_overall_delay['EMA'] + 1000000
monthly_overall_delay['ema_lower'] = monthly_overall_delay['EMA'] - 1000000

fig3 = go.Figure()
fig3.add_trace(go.Scatter(x=monthly_overall_delay['date'], y=monthly_overall_delay['arr_delay'],
                          mode='markers', name='Actual', marker=dict(size=4, color='green')))
fig3.add_trace(go.Scatter(x=monthly_overall_delay['date'], y=monthly_overall_delay['EMA'],
                          mode='lines', name='EMA', line=dict(color='blue')))
fig3.add_trace(go.Scatter(x=monthly_overall_delay['date'], y=monthly_overall_delay['ema_upper'],
                          line=dict(width=0), showlegend=False))
fig3.add_trace(go.Scatter(x=monthly_overall_delay['date'], y=monthly_overall_delay['ema_lower'],
                          fill='tonexty', fillcolor='rgba(255,0,0,0.2)', line=dict(width=0),
                          name='Prediction Interval'))
fig3.update_layout(title="Arrival Delay with EMA and Interval", template="plotly_white")
fig3.show()

As with the Simple Moving Average, if we treat forecast errors beyond ±1M minutes per month as anomalies, the Exponential Smoothing plot shows many anomalous delays since COVID-19. A simple count of anomalies also shows that Exponential Smoothing flags more of them than the Simple Moving Average (91 vs. 74, as computed below).

Performance Evaluation¶

In [22]:
# Ensure no nulls in the relevant columns
valid_bounds = monthly_overall_delay.dropna(subset=['arr_delay', 'ema_lower', 'ema_upper'])

# Calculate how many points are outside the bounds
outside_bounds = valid_bounds[
    (valid_bounds['arr_delay'] < valid_bounds['ema_lower']) |
    (valid_bounds['arr_delay'] > valid_bounds['ema_upper'])
]

# Output the results
n_total = len(valid_bounds)
n_outside = len(outside_bounds)
pct_outside = n_outside / n_total * 100

print(f"Total valid rows: {n_total}")
print(f"Rows outside EMA bounds: {n_outside} ({pct_outside:.2f}%)")

# Drop rows where EMA is missing (should only be at the beginning if any)
valid_ema = monthly_overall_delay.dropna(subset=['arr_delay', 'EMA'])

# Calculate MAPE and MSE
mape_ema = mean_absolute_percentage_error(valid_ema['arr_delay'], valid_ema['EMA'])
mse_ema = mean_squared_error(valid_ema['arr_delay'], valid_ema['EMA'])

print(f"MAPE (EMA model): {mape_ema*100:.2f}%")
print(f"MSE (EMA model): {mse_ema:,.0f}")
Total valid rows: 242
Rows outside EMA bounds: 91 (37.60%)
MAPE (EMA model): 36.84%
MSE (EMA model): 1,465,865,459,636

Seasonal-Trend Decomposition (STD)¶

In [23]:
# Make sure date is datetime and sorted
monthly_overall_delay = monthly_overall_delay.sort_values('date')
monthly_overall_delay.set_index('date', inplace=True)

# Decompose with additive model
decomposition = sm.tsa.seasonal_decompose(monthly_overall_delay['arr_delay'], model='additive', period=12) # since our data is monthly, one full cycle is 12 months

# Extract components
trend = decomposition.trend
seasonal = decomposition.seasonal
resid = decomposition.resid

# Create Plotly subplots
fig = make_subplots(rows=3, cols=1, shared_xaxes=True,
                    vertical_spacing=0.03,
                    subplot_titles=("Trend", "Seasonality", "Residual"))

fig.add_trace(go.Scatter(x=trend.index, y=trend, mode='lines', name='Trend'), row=1, col=1)
fig.add_trace(go.Scatter(x=seasonal.index, y=seasonal, mode='lines', name='Seasonal'), row=2, col=1)
fig.add_trace(go.Scatter(x=resid.index, y=resid, mode='lines', name='Residual'), row=3, col=1)

fig.update_layout(
    height=700,
    title_text="Seasonal-Trend Decomposition (Additive)",
    template="plotly_white",
    showlegend=False
)

fig.show()

Below are some insights from the Seasonal-Trend Decomposition (STD) analysis:

  • Trend shows a steady increase in total arrival delays from 2005 to 2019
  • A sharp drop in delays is observed around 2020 due to the COVID-19 pandemic
  • After 2020, delays rise again and surpass pre-pandemic levels by 2024
  • Seasonality is consistent with clear annual cycles
  • Peaks in delays occur mid-year, likely during summer travel months
  • Troughs occur during winter months when travel is lower
  • Residuals are mostly stable across time
  • Large residual spikes appear since 2020, indicating abnormal conditions, i.e. Covid-19 impact

We will now use the Prophet library to model the time series and forecast future delays. To improve the model's accuracy, we first add some additional features to the dataset.

Performance Evaluation¶

In [24]:
# Reconstruct fitted values
fitted = trend + seasonal

# Drop NaNs (common in trend edges)
eval_df = monthly_overall_delay[['arr_delay']].copy()
eval_df['fitted'] = fitted
eval_df = eval_df.dropna()

# Compute errors
mse = mean_squared_error(eval_df['arr_delay'], eval_df['fitted'])
mape = mean_absolute_percentage_error(eval_df['arr_delay'], eval_df['fitted'])

# Define static margin (±1 million minutes, same band as the SMA/EMA models)
margin = 1_000_000
eval_df['upper'] = eval_df['fitted'] + margin
eval_df['lower'] = eval_df['fitted'] - margin

# Count how many points fall outside the bounds
outside_bounds = eval_df[
    (eval_df['arr_delay'] > eval_df['upper']) | 
    (eval_df['arr_delay'] < eval_df['lower'])
]

n_total = len(eval_df)
n_outside = len(outside_bounds)
pct_outside = n_outside / n_total * 100

# Display results
print(f"STD model performance:")
print(f"- MAPE: {mape:.2%}")
print(f"- MSE: {mse:,.2f}")
print(f"- Points outside ±{margin:,} bounds: {n_outside} out of {n_total} ({pct_outside:.2f}%)")
STD model performance:
- MAPE: 21.00%
- MSE: 462,049,177,386.65
- Points outside ±1,000,000 bounds: 25 out of 230 (10.87%)

Data Wrangling: Add Features¶

1. Holidays per month¶

In [25]:
# Define the date range
start_date = '2005-01-01'
end_date = '2025-05-31'

# Create calendar and get holidays in that range
calendar = USFederalHolidayCalendar()
holidays = calendar.holidays(start=start_date, end=end_date)

# Convert to DataFrame with year and month
holidays_df = pd.DataFrame({'date': holidays})
holidays_df['year'] = holidays_df['date'].dt.year
holidays_df['month'] = holidays_df['date'].dt.month

# Count holidays by (year, month)
monthly_flags = (
    holidays_df.groupby(['year', 'month'])
    .size()
    .reset_index(name='num_holidays')
)

# Fill in missing (year, month) pairs with 0 holidays
all_months = pd.MultiIndex.from_product(
    [range(2005, 2026), range(1, 13)],
    names=['year', 'month']
)

monthly_flags = (
    monthly_flags
    .set_index(['year', 'month'])
    .reindex(all_months, fill_value=0)
    .reset_index()
)

# Preview
print(monthly_flags.head(12))  # first year
    year  month  num_holidays
0   2005      1             1
1   2005      2             1
2   2005      3             0
3   2005      4             0
4   2005      5             1
5   2005      6             0
6   2005      7             1
7   2005      8             0
8   2005      9             1
9   2005     10             1
10  2005     11             2
11  2005     12             1

2. Holiday/travel season flag¶

  • Summer travel: June, July, August

  • Holiday travel: November (Thanksgiving), December (Christmas/New Year)

Source: TSA Throughput Data, DOT Bureau of Transportation Statistics

In [26]:
monthly_flags['travel_season_flag'] = monthly_flags['month'].isin([6, 7, 8, 11, 12]).astype(int)
# Preview
print(monthly_flags.head(12))  # first year
    year  month  num_holidays  travel_season_flag
0   2005      1             1                   0
1   2005      2             1                   0
2   2005      3             0                   0
3   2005      4             0                   0
4   2005      5             1                   0
5   2005      6             0                   1
6   2005      7             1                   1
7   2005      8             0                   1
8   2005      9             1                   0
9   2005     10             1                   0
10  2005     11             2                   1
11  2005     12             1                   1

3. Disaster flag¶

Flag for major disasters (e.g., hurricanes, wildfires) that could impact air travel

In [27]:
monthly_flags['date'] = pd.to_datetime(monthly_flags[['year', 'month']].assign(day=1))

# 1. COVID-19 and its aftermath: March 2020 through May 2025 (end of data)
monthly_flags['covid_flag'] = (
    (monthly_flags['date'] >= '2020-03-01') & (monthly_flags['date'] <= '2025-05-01')
).astype(int)

# 2. Recession (Dec 2007 – Jun 2009)
monthly_flags['recession_flag'] = (
    (monthly_flags['date'] >= '2007-12-01') & (monthly_flags['date'] <= '2009-06-01')
).astype(int)

# 3. Fuel spike periods
fuel_spike_periods = [
    ('2008-04-01', '2008-10-01'),
    ('2012-02-01', '2012-04-01'),
    ('2022-03-01', '2022-07-01')
]
monthly_flags['fuel_spike_flag'] = 0
for start, end in fuel_spike_periods:
    monthly_flags.loc[
        (monthly_flags['date'] >= start) & (monthly_flags['date'] <= end),
        'fuel_spike_flag'
    ] = 1

# 4. Natural disaster months (sorted chronologically)
natural_disaster_months = [
    '2005-08',  # Hurricane Katrina
    '2008-09',  # Hurricane Ike
    '2011-08',  # Hurricane Irene
    '2012-10',  # Hurricane Sandy
    '2017-08',  # Hurricane Harvey
    '2017-09',  # Hurricane Irma
    '2018-09',  # Hurricane Florence
    '2018-10',  # Hurricane Michael
    '2020-08',  # Hurricane Laura
    '2021-02',  # Texas snowstorm
    '2021-08',  # Hurricane Ida
    '2022-09',  # Hurricane Ian
    '2023-08',  # Hawaii wildfires / Hurricane Idalia
    '2024-09',  # Hurricane Helene
    '2025-03',  # tornado outbreak
    '2025-06',  # Southern California wildfires
]

monthly_flags['natural_disaster_flag'] = (
    monthly_flags['date'].dt.strftime('%Y-%m').isin(natural_disaster_months)
).astype(int)

# Drop date column if not needed
monthly_flags.drop(columns='date', inplace=True)

# Preview
print(monthly_flags.head())
   year  month  num_holidays  travel_season_flag  covid_flag  recession_flag  \
0  2005      1             1                   0           0               0   
1  2005      2             1                   0           0               0   
2  2005      3             0                   0           0               0   
3  2005      4             0                   0           0               0   
4  2005      5             1                   0           0               0   

   fuel_spike_flag  natural_disaster_flag  
0                0                      0  
1                0                      0  
2                0                      0  
3                0                      0  
4                0                      0  

Prophet¶

In [28]:
# Step 1: Prepare data for Prophet
# Prophet expects columns named 'ds' for date and 'y' for target
prophet_df = monthly_overall_delay.reset_index()[['date', 'arr_delay']].rename(columns={
    'date': 'ds',
    'arr_delay': 'y'
})

# Step 2: Fit Prophet model
model = Prophet(
    yearly_seasonality=True,
    daily_seasonality=False,
    weekly_seasonality=False
)
model.fit(prophet_df)

# Step 3: Create future dataframe (6 months ahead)
future = model.make_future_dataframe(periods=6, freq='MS')  # 'MS' = month start
forecast = model.predict(future)

# Step 4: Plot forecast (matplotlib)
fig1 = model.plot(forecast)

# Step 5: Plot components (trend, seasonality, etc.)
fig2 = model.plot_components(forecast)
21:47:53 - cmdstanpy - INFO - Chain [1] start processing
21:47:53 - cmdstanpy - INFO - Chain [1] done processing

The trend component shows a steady increase in arrival delays from 2005 onward, with a noticeable acceleration after 2020.

The seasonality component reveals a strong yearly cycle, with delays typically peaking around July and dipping sharply in November, consistent with summer travel surges and reduced traffic during late fall.

Overall, Prophet captures both the long-term growth in delays and recurring annual patterns, supporting the presence of consistent seasonal effects in flight delay behavior.

Performance evaluation¶

In [29]:
# Merge actuals and forecast on 'ds'
eval_df = forecast.merge(prophet_df[['ds', 'y']], on='ds', how='left')

# Drop rows with missing actuals (only keep rows within training period)
eval_df = eval_df.dropna(subset=['y'])

# Count how many actual points fall outside the prediction interval
outside_bounds = eval_df[
    (eval_df['y'] < eval_df['yhat_lower']) |
    (eval_df['y'] > eval_df['yhat_upper'])
]

# Summary stats
n_total = len(eval_df)
n_outside = len(outside_bounds)
pct_outside = n_outside / n_total * 100

print(f"Total points evaluated: {n_total}")
print(f"Points outside prediction interval: {n_outside} ({pct_outside:.2f}%)")



# Merge actuals and predictions on date
eval_df = prophet_df[['ds', 'y']].merge(
    forecast[['ds', 'yhat']], on='ds', how='left'
)

# Drop rows with missing predictions (e.g., if yhat doesn't cover all ds)
eval_df = eval_df.dropna(subset=['y', 'yhat'])

# Calculate metrics
mape = mean_absolute_percentage_error(eval_df['y'], eval_df['yhat'])
mse = mean_squared_error(eval_df['y'], eval_df['yhat'])

print(f"MAPE: {mape:.4f}")
print(f"MSE: {mse:,.0f}")
Total points evaluated: 242
Points outside prediction interval: 34 (14.05%)
MAPE: 0.3940
MSE: 1,147,133,606,165

Prophet with Additional Features¶

In [30]:
prophet_df = monthly_overall_delay.reset_index().merge(
    monthly_flags,
    on=['year', 'month'],
    how='left'
)

prophet_df = prophet_df.rename(columns={
    'date': 'ds',
    'arr_delay': 'y'
})[['ds', 'y', 'num_holidays', 'travel_season_flag', 'covid_flag',
    'recession_flag', 'fuel_spike_flag', 'natural_disaster_flag']]

model = Prophet(yearly_seasonality=True, daily_seasonality=False, weekly_seasonality=False)

# Add all external regressors
model.add_regressor('num_holidays')
model.add_regressor('travel_season_flag')
model.add_regressor('covid_flag')
model.add_regressor('recession_flag')
model.add_regressor('fuel_spike_flag')
model.add_regressor('natural_disaster_flag')

model.fit(prophet_df)

future = model.make_future_dataframe(periods=6, freq='MS')

future = future.merge(monthly_flags,
                      left_on=[future.ds.dt.year, future.ds.dt.month],
                      right_on=['year', 'month'],
                      how='left')

future = future[['ds', 'num_holidays', 'travel_season_flag', 'covid_flag',
                 'recession_flag', 'fuel_spike_flag', 'natural_disaster_flag']]

forecast = model.predict(future)

plot_plotly(model, forecast)
21:47:54 - cmdstanpy - INFO - Chain [1] start processing
21:47:54 - cmdstanpy - INFO - Chain [1] done processing
In [31]:
plot_components_plotly(model, forecast)

Performance Evaluation¶

In [32]:
# Merge actuals and forecast on 'ds'
eval_df = forecast.merge(prophet_df[['ds', 'y']], on='ds', how='left')

# Drop rows with missing actuals (only keep rows within training period)
eval_df = eval_df.dropna(subset=['y'])

# Count how many actual points fall outside the prediction interval
outside_bounds = eval_df[
    (eval_df['y'] < eval_df['yhat_lower']) |
    (eval_df['y'] > eval_df['yhat_upper'])
]

# Summary stats
n_total = len(eval_df)
n_outside = len(outside_bounds)
pct_outside = n_outside / n_total * 100

print(f"Total points evaluated: {n_total}")
print(f"Points outside prediction interval: {n_outside} ({pct_outside:.2f}%)")

# Merge actuals and predictions on date
eval_df = prophet_df[['ds', 'y']].merge(
    forecast[['ds', 'yhat']], on='ds', how='left'
)

# Drop rows with missing predictions (e.g., if yhat doesn't cover all ds)
eval_df = eval_df.dropna(subset=['y', 'yhat'])

# Calculate metrics
mape = mean_absolute_percentage_error(eval_df['y'], eval_df['yhat'])
mse = mean_squared_error(eval_df['y'], eval_df['yhat'])

print(f"MAPE: {mape:.4f}")
print(f"MSE: {mse:,.0f}")
Total points evaluated: 242
Points outside prediction interval: 44 (18.18%)
MAPE: 0.2812
MSE: 710,579,125,884

Conclusion¶


Model Performance Summary

| Model | Valid Points | Outside Boundaries (%) | MAPE (%) | MSE (×10¹²) |
|---|---|---|---|---|
| Simple Moving Average | 238 | 74 (31.09%) | 28.66 | 1.089 |
| Exponential Moving Avg | 242 | 91 (37.60%) | 36.84 | 1.466 |
| Seasonal-Trend Decomp. | 230 | 25 (10.87%) | 21.00 | 0.462 |
| Prophet (baseline) | 242 | 34 (14.05%) | 39.40 | 1.147 |
| Prophet (+ features) | 242 | 43 (17.77%) | 28.12 | 0.711 |

Across the evaluated time series models, the Seasonal-Trend Decomposition (STD) model emerged as the strongest performer, achieving the lowest MSE and MAPE while maintaining tight prediction bounds (only 10.87% of points fell outside the interval). The Simple Moving Average (SMA) model performed moderately but lacked responsiveness to trends, while Exponential Moving Average (EMA) showed high error and the widest prediction gaps. The Prophet model improved considerably when enhanced with relevant external features, reducing its MSE by ~38% and MAPE by over 11 percentage points, though it still exhibited a higher percentage of boundary violations. These results highlight that incorporating domain-specific seasonality and structure (as in STD) or contextual features (as in Prophet+) can significantly improve forecasting accuracy.

All models had difficulty fitting post-COVID-19 data, which is understandable: the aviation industry is still recovering from the pandemic, and flight schedules remain irregular.
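That difficulty can be made concrete by splitting the error metric at the pandemic onset. A hedged sketch with a hypothetical frame (the helper name and demo values are illustrative; the column names match the notebook's SMA setup):

```python
import pandas as pd
from sklearn.metrics import mean_absolute_percentage_error

def mape_split(df, cutoff='2020-03-01', actual='arr_delay',
               pred='SMA', date='date'):
    """Return (pre-cutoff MAPE, post-cutoff MAPE) for one model column."""
    pre = df[df[date] < cutoff]
    post = df[df[date] >= cutoff]
    return (mean_absolute_percentage_error(pre[actual], pre[pred]),
            mean_absolute_percentage_error(post[actual], post[pred]))

# Illustrative frame: the model tracks well before the cutoff, poorly after
demo = pd.DataFrame({
    'date': pd.to_datetime(['2019-01-01', '2019-02-01',
                            '2020-04-01', '2020-05-01']),
    'arr_delay': [100.0, 110.0, 40.0, 60.0],
    'SMA': [98.0, 108.0, 95.0, 90.0],
})
pre_mape, post_mape = mape_split(demo)
print(pre_mape < post_mape)  # True
```

Running `mape_split` on `monthly_overall_delay` for each model's fitted column would quantify how much of each model's error is concentrated in the post-COVID period.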

In [ ]: